
Conversation

@inisis
Contributor

@inisis inisis commented Oct 28, 2025

What does this PR do?

Type of change:

Add onnxslim support

Overview: Onnxslim is under active development and committed to long-term support; it is easy to use and depends on very few packages.

Usage

$ python -m modelopt.onnx.quantization --onnx_path=$MODEL_NAME.onnx --simplify
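For reference, a minimal Python-API sketch of the same flow. This assumes modelopt.onnx.quantization exposes a quantize() entry point mirroring the CLI flags; the simplify keyword name is an assumption based on the new --simplify flag, not confirmed in this PR:

```python
from modelopt.onnx.quantization import quantize

# Hypothetical Python equivalent of the CLI above; `simplify` mirrors
# the --simplify flag added by this PR (assumed keyword name).
quantize(onnx_path="model.onnx", simplify=True)
```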

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@inisis inisis requested review from a team as code owners October 28, 2025 11:49
@copy-pr-bot

copy-pr-bot bot commented Oct 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@inisis
Contributor Author

inisis commented Nov 1, 2025

@gcunhase Hi, any update here? Thanks.

@gcunhase
Contributor

gcunhase commented Nov 3, 2025

@gcunhase Hi, any update here? Thanks.

@inisis Thank you for your contribution. I'm investigating potential gaps between onnxsim and onnxslim and will get back to you as soon as possible.

@gcunhase
Contributor

gcunhase commented Nov 6, 2025

@inisis I'm still validating onnxslim on our end, but in the meantime, could you please check that switching to onnxslim doesn't break quantization of https://github.com/NVIDIA/DL4AGX/tree/master/AV-Solutions/bevformer-int8-eq?

Specifically, please check that the following CLI is still functional and performant:

$ python -m modelopt.onnx.quantization --onnx_path=/mnt/models/bevformer_tiny_epoch_24_cp2_op13.onnx \
      --trt_plugins=$PLUGIN_PATH \
      --op_types_to_exclude MatMul \
      --calibration_data_path=/workspace/BEVFormer_tensorrt/data/nuscenes/calib_data.npz \
      --simplify

Thanks!

@inisis
Contributor Author

inisis commented Nov 11, 2025

Hi @gcunhase, it took me some time to run bevformer-int8-eq, but everything is working fine. Here are the results:

Env

device: NVIDIA GeForce RTX 5090
pytorch-quantization      2.2.1
torch                     2.9.0+cu128
torchvision               0.24.0+cu128
onnx                      1.17.0
onnx_graphsurgeon         0.5.8
onnx-ir                   0.1.12
onnxconverter-common      1.16.0
onnxruntime-gpu           1.20.2
onnxscript                0.5.6
onnxsim                   0.4.36
onnxslim                  0.1.74

Without simplify

[benchmark screenshot]

With onnxsim

[benchmark screenshot]

With onnxslim

[benchmark screenshot]

To conclude:

| Method | FPS | Acceleration Ratio |
| --- | --- | --- |
| Without simplify | 354 | 1.00× |
| With onnxsim | 371 | 1.05× |
| With onnxslim | 381 | 1.08× |

In terms of GPU Compute Time (median, ms), however, onnxsim is slightly faster. I compared the two models using:

onnxslim --inspect /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_sim.onnx /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_slim.onnx
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Name          |     bevformer_tiny_epoch_24_cp2_op13     |     bevformer_tiny_epoch_24_cp2_op13     |
|                              |             .quant_sim.onnx              |             .quant_slim.onnx             |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Info          |       Op Set: 13 / IR Version: 10        |       Op Set: 13 / IR Version: 10        |
+------------------------------+------------------------------------------+------------------------------------------+
|          IN: image           |       float32: (1, 6, 3, 480, 800)       |       float32: (1, 6, 3, 480, 800)       |
|         IN: prev_bev         |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|       IN: use_prev_bev       |              float32: (1,)               |              float32: (1,)               |
|         IN: can_bus          |              float32: (18,)              |              float32: (18,)              |
|        IN: lidar2img         |          float32: (1, 6, 4, 4)           |          float32: (1, 6, 4, 4)           |
|        OUT: bev_embed        |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|     OUT: outputs_classes     |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
|     OUT: outputs_coords      |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
+------------------------------+------------------------------------------+------------------------------------------+
|             Add              |                   318                    |                   185                    |
|             Atan             |                    1                     |                    1                     |
|             Clip             |                    26                    |                    26                    |
|            Concat            |                    16                    |                    16                    |
|             Conv             |                    55                    |                    55                    |
|             Cos              |                    1                     |                    1                     |
|       DequantizeLinear       |                   175                    |                   393                    |
|             Div              |                    67                    |                    67                    |
|            Gather            |                    14                    |                    14                    |
|             Gemm             |                    7                     |                   140                    |
|           Greater            |                    3                     |                    3                     |
|             Less             |                    2                     |                    2                     |
|             Log              |                    15                    |                    15                    |
|            MatMul            |                   142                    |                    11                    |
|             Max              |                    1                     |                    1                     |
|           MaxPool            |                    1                     |                    1                     |
|             Mul              |                    81                    |                    81                    |
| MultiScaleDeformableAttnTRT2 |                    12                    |                    12                    |
|             Pow              |                    41                    |                    41                    |
|        QuantizeLinear        |                   175                    |                   393                    |
|          ReduceMean          |                    81                    |                    81                    |
|          ReduceProd          |                    1                     |                    1                     |
|          ReduceSum           |                    4                     |                    4                     |
|             Relu             |                    96                    |                    96                    |
|           Reshape            |                   105                    |                   269                    |
|          RotateTRT2          |                    1                     |                    1                     |
|          ScatterND           |                    58                    |                    58                    |
|           Sigmoid            |                    18                    |                    18                    |
|             Sign             |                    2                     |                    2                     |
|             Sin              |                    1                     |                    1                     |
|            Slice             |                    84                    |                    84                    |
|           Softmax            |                    5                     |                    5                     |
|            Split             |                    1                     |                    0                     |
|             Sqrt             |                    40                    |                    40                    |
|           Squeeze            |                    1                     |                    1                     |
|             Sub              |                    59                    |                    59                    |
|             Tile             |                    6                     |                    6                     |
|          Transpose           |                    36                    |                    36                    |
|          Unsqueeze           |                    30                    |                    30                    |
|            Where             |                    5                     |                    5                     |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Size          |                158.77 MB                 |                158.90 MB                 |
+------------------------------+------------------------------------------+------------------------------------------+

Onnxslim merges MatMul + Add into Gemm, which is undesirable when using --op_types_to_exclude MatMul.
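If needed, this fusion could likely be skipped via onnxslim's skip_fusion_patterns option (used later in this thread as a WAR for FusionConvBN). A sketch, assuming the MatMul + Add -> Gemm pattern is named "FusionGemm" (the pattern name is an assumption, not confirmed here):

```python
import onnxslim

# Hypothetical: keep MatMul + Add unfused so that
# --op_types_to_exclude MatMul still matches the original MatMul nodes.
# "FusionGemm" is an assumed pattern name.
model = onnxslim.slim(
    "bevformer_tiny_epoch_24_cp2_op13.onnx",
    skip_fusion_patterns=["FusionGemm"],
)
```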

@gcunhase
Contributor


Hi @inisis, thanks for validating this functionality. Were you also able to validate the numerical accuracy of the onnxslim-simplified model?

I will also investigate the MatMul+Add vs. Gemm substitution on my end in the meantime.

Thanks!

@inisis
Contributor Author

inisis commented Nov 11, 2025


@gcunhase I didn't use the full nuScenes dataset, as it's too big; I used the mini one for calibration. If this counts, I can verify it on the mini one.

@gcunhase
Contributor


No problem, let me try to verify the accuracy on my end. Thank you!

@inisis
Contributor Author

inisis commented Nov 18, 2025

Hi @gcunhase , is there any update? Thanks

@gcunhase
Contributor

@inisis we appreciate your contribution and wanted to make sure that there are no regressions before merging this PR. We've investigated potential risks in ~150 models and compiled a list of issues, divided into 3 categories, that would need to be solved before merging.

All mentioned models and scripts are in the zip file: repro.zip

1. Functional failures

Error logs

Error 1: repro_io_tensors_shape_dtype.onnx

Graph input and output tensors must include dtype information. Please set the dtype attribute for: Variable (NMS): (shape=None, dtype=None)) 

Error 2: repro_mode_error_mobilenetv1.onnx

Fail - onnxSLIM (onnxSLIM: 'mode') 

How to repro

import onnx
import onnxslim

# Path to one of the failing models from repro.zip.
input_model_path = "repro_io_tensors_shape_dtype.onnx"

model = onnx.load(input_model_path)
simplified_model = onnxslim.slim(model)

2. ORT inference failures

Error logs

Error 1: repro_mul_incompatible_dimensions.onnx

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from repro_mul_incompatible_dimensions.onnx failed:Node (/stages.1/stages.1.0/Mul) Op (Mul) [ShapeInferenceError] Incompatible dimensions 

Error 2: repro_gemm_invalid_shape.onnx

Fail: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gemm node. Name:'/transformer/decoder/layers.0/attentions.1/attn/Gemm' Status Message: Gemm: Invalid bias shape for broadcast 

How to repro

Run the check_ort_failures.py Python script (updating input_model_path as needed).

3. ORT numerical accuracy failures

Error logs

The simplified versions of the following models do not produce the same outputs as the original model for the same input data:

  • issue3_repro_conv_bn_fusion.onnx
    • WAR: skip_fusion_patterns=["FusionConvBN"]
  • issue3_repro_conv_resize_issue.onnx
    • WAR: none found.

How to repro

Run the check_ort_failures.py Python script (updating input_model_path as needed).

--
Please let us know if there are any additional questions on any of the items.
Thanks!

@inisis
Contributor Author

inisis commented Nov 22, 2025

@gcunhase We greatly appreciate your comprehensive testing, which has helped us improve onnxslim. All the issues you mentioned have been resolved in onnxslim version 0.1.75, and these models have also been added to onnxslim's daily CI. Many thanks again.

Here are some details on how the issues were solved:

1. Functional failures

If a model ends with a custom operator as its output, onnxslim cannot perform symbolic shape inference for it, so the output loses its dtype and shape. We improved this by reusing the information already stored in the original model. Users can also provide custom shape inference logic for their own functions; onnxslim supports this and provides a template for it.

2. ORT inference failures

In onnxslim, shape inference for the outputs of the Resize node followed the official ONNX documentation:
https://onnx.ai/onnx/operators/onnx__Resize.html#summary
In the official doc, the output size is floored:

output_dimension = floor(input_dimension * (roi_end - roi_start) * scale)

whereas in onnxruntime, the output size is rounded:
https://github.com/microsoft/onnxruntime/blob/977efe4788b2ee24371523b5fa14dd02efcd4942/onnxruntime/core/providers/cpu/tensor/upsample.cc#L70

So there was a mismatch that, in some cases, produced an incompatible-dimensions issue; onnxslim is now aligned with ORT.
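A tiny plain-Python illustration of the mismatch (not onnxslim or ORT code; the dimensions are arbitrary examples):

```python
import math

# The ONNX spec floors the scaled output size, while ORT's resize
# rounds it, so the two can disagree near .5 boundaries.
input_dim, scale = 11, 0.5                # arbitrary example values
spec_dim = math.floor(input_dim * scale)  # 5, per the ONNX spec
ort_dim = int(input_dim * scale + 0.5)    # 6, rounded as in ORT
print(spec_dim, ort_dim)                  # 5 6 -> shape mismatch downstream
```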

3. ORT numerical accuracy failures

There is a precision issue with issue3_repro_conv_resize_issue.onnx: check_ort_failures.py uses np.array_equal, which is very strict. I checked the maximum diff, which is 3.5762787e-07, and if tested with

opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

the np.array_equal check passes, so I suspect some ORT optimization is responsible for this numerical diff.
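For reference, a sketch of the comparison described above (assumes static input shapes; the .slim.onnx file name is a hypothetical onnxslim output, not a file from repro.zip):

```python
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# With the default level (ORT_ENABLE_ALL) the outputs diverge by
# ~3.6e-07; limiting optimizations to EXTENDED makes them bit-exact.
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

sess_a = ort.InferenceSession("issue3_repro_conv_resize_issue.onnx", opts)
sess_b = ort.InferenceSession("issue3_repro_conv_resize_issue.slim.onnx", opts)

# Feed identical random inputs to both models.
feed = {
    i.name: np.random.rand(*i.shape).astype(np.float32)
    for i in sess_a.get_inputs()
}
out_a = sess_a.run(None, feed)
out_b = sess_b.run(None, feed)

print(all(np.array_equal(a, b) for a, b in zip(out_a, out_b)))          # strict
print(all(np.allclose(a, b, atol=1e-6) for a, b in zip(out_a, out_b)))  # tolerant
```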

Contributor

@gcunhase gcunhase left a comment


@inisis we appreciate your speedy and detailed reply!

I was able to verify that all cases now pass with v0.1.75 and that disabling layout optimizations in ORT solves the numerical accuracy issue observed in the last model. This is achieved by adding the following line in our comparison script (as you suggested):

session_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

Approved.

@gcunhase
Contributor

@kevalmorabia97 can you please update the CHANGELOG file? I'm not sure which ModelOpt version this update will be included in.

My suggestion would be something like: Replace ONNX simplification package from 'onnxsim' to 'onnxslim'.

Thanks.

Collaborator

@kevalmorabia97 kevalmorabia97 left a comment


Thanks for your contribution. Great to have a better ONNX simplification package! Will wait for CI/CD to pass and then merge.

@kevalmorabia97 kevalmorabia97 changed the title feat: add onnxslim support Replace ONNX simplification package from onnxsim to onnxslim Nov 26, 2025
@codecov

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.78%. Comparing base (768ee6a) to head (3b1e46c).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| modelopt/onnx/quantization/quantize.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #478      +/-   ##
==========================================
+ Coverage   74.76%   74.78%   +0.02%     
==========================================
  Files         183      183              
  Lines       18630    18626       -4     
==========================================
+ Hits        13929    13930       +1     
+ Misses       4701     4696       -5     


@kevalmorabia97
Collaborator

@inisis there is a conflict when installing with torch 2.6:

The conflict is caused by:
    torch 2.6.0 depends on sympy==1.13.1; python_version >= "3.9"
    onnxruntime-gpu 1.22.0 depends on sympy
    onnxslim 0.1.75 depends on sympy>=1.13.3

@inisis
Contributor Author

inisis commented Nov 26, 2025

There is a sympy version conflict

The conflict is caused by:
    torch 2.6.0 depends on sympy==1.13.1; python_version >= "3.9"
    onnxruntime-gpu 1.22.0 depends on sympy
    onnxslim 0.1.75 depends on sympy>=1.13.3

@kevalmorabia97
Collaborator

Can onnxslim relax its required sympy version?

@inisis
Contributor Author

inisis commented Nov 26, 2025

@inisis there is conflict installing with torch2.6

The conflict is caused by:
    torch 2.6.0 depends on sympy==1.13.1; python_version >= "3.9"
    onnxruntime-gpu 1.22.0 depends on sympy
    onnxslim 0.1.75 depends on sympy>=1.13.3

Yes, I will check it ASAP. I don't understand why PyTorch needs to pin SymPy to version 1.13.1.

@inisis
Contributor Author

inisis commented Nov 26, 2025

@kevalmorabia97 the latest PyTorch requires sympy>=1.13.3: https://github.com/pytorch/pytorch/blob/main/pyproject.toml#L47

There is also a conflict in onnxslim's CI, but it didn't break the pipeline.
https://github.com/inisis/OnnxSlim/actions/runs/19593268908/job/56114498429#step:4:52

realAsma and others added 8 commits November 26, 2025 21:04
…AutoQuantizeGradientSearcher; separated quant modules and score modules (NVIDIA#586)

## What does this PR do?

**Type of change:**  Refactor; Minor new feature

**Overview:** ?

1. Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher &
AutoQuantizeGradientSearcher - Prepares architecture for additional
search methods.
2. separated quant modules and score modules - separate quantization
modules from scoring modules, enabling auto-quantization to measure
sensitivity at parent layers (e.g., MLP output for MoE experts) rather
than individual ops.
3. Also see NVIDIA#592
and NVIDIA#588

## Testing
See unittests; `tests/unit/torch/quantization/test_autoquant.py` and
`tests/unit/torch/quantization/plugins/test_huggingface.py`

## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->

- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Not Required

## Additional Information
<!-- E.g. related issue. -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added support for score modules in quantization workflows.
  * Added optional naming for quantization recipes.

* **Bug Fixes**
* Improved quantization grouping rules documentation with clearer
configuration examples.

* **Refactor**
  * Renamed quantization module parameters for improved clarity.
  * Enhanced quantization search architecture for better scalability.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
@inisis inisis force-pushed the main branch 2 times, most recently from 77d6bf9 to 18c27dd Compare November 26, 2025 13:07
@inisis inisis requested review from a team as code owners November 26, 2025 13:07
@inisis inisis requested a review from realAsma November 26, 2025 13:07
@kevalmorabia97 kevalmorabia97 removed request for a team and realAsma November 26, 2025 17:45
@kevalmorabia97
Collaborator

/ok to test 3b1e46c

@kevalmorabia97 kevalmorabia97 merged commit 261858c into NVIDIA:main Nov 26, 2025
27 checks passed